Abstract

The Vietoris-Rips filtration is a versatile tool in topological data
analysis. It is a sequence of simplicial complexes built on a metric
space to add topological structure to an otherwise disconnected set of
points. It is widely used because it encodes useful information about
the topology of the underlying metric space. This information is often
extracted from its so-called persistence diagram. Unfortunately, this
filtration is often too large to construct in full. We show how to
construct an $O(n)$-size filtered simplicial complex on an $n$-point
metric space such that its persistence diagram is a good approximation
to that of the Vietoris-Rips filtration. This new filtration can be
constructed in $O(n \log n)$ time. The constant factors in both the size
and the running time depend only on the doubling dimension of the metric
space and the desired tightness of the approximation. For the first
time, this makes it computationally tractable to approximate the
persistence diagram of the Vietoris-Rips filtration across all scales
for large data sets.

We describe two different sparse filtrations. The first is a zigzag
filtration that removes points as the scale increases. The second is a
(non-zigzag) filtration that yields the same persistence diagram. Both
methods are based on a hierarchical net-tree and yield the same
guarantees.



1 Introduction 

There is an extensive literature on the problem of computing sparse approx- 
imations to metric spaces (see the book [29] and references therein). There 
is also a growing literature on topological data analysis and its efforts to 
extract topological information from metric data (see the survey [3] and 
references therein). One might expect that topological data analysis would 
be a major user of metric approximation algorithms, especially given that
topological data analysis often considers simplicial complexes that grow
exponentially in the number of input points. Unfortunately, this is not the
case. The benefits of a sparser representation are sorely needed, but it is 
not obvious how an approximation to the metric will affect the underlying 
topology. The goal of this paper is to bring together these two research ar- 
eas and to show how to build sparse metric approximations that come with 
topological guarantees. 

The target for approximation is the Vietoris-Rips complex, which has
a simplex for every subset of input points with diameter at most some
parameter $\alpha$. The collection of Vietoris-Rips complexes at all scales
yields the Vietoris-Rips filtration. The persistence algorithm takes this
filtration and produces a persistence diagram representing the changes in
topology corresponding to changes in scale [33]. The Vietoris-Rips
filtration has become a standard tool in topological data analysis because
it encodes relevant and useful information about the topology of the
underlying metric space [8]. It also extends easily to high dimensional
data, general metric spaces, or even non-metric distance functions.

Unfortunately, the Vietoris-Rips filtration has a major drawback: It's
huge! Even the $k$-skeleton (the simplices up to dimension $k$) has size
$O(n^{k+1})$ for $n$ points.
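To make the blowup concrete, the following sketch counts the simplices of dimension at most $k$ in the worst case, where every subset of up to $k+1$ points spans a simplex (as happens once the scale exceeds the diameter of the point set). The function name `full_skeleton_size` is ours and purely illustrative.

```python
from math import comb


def full_skeleton_size(n: int, k: int) -> int:
    """Count simplices of dimension <= k on n vertices when every
    subset of at most k + 1 vertices is a simplex."""
    # A j-vertex subset is a simplex of dimension j - 1.
    return sum(comb(n, j) for j in range(1, k + 2))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_skeleton_size(n, 3))
```

The leading term is $\binom{n}{k+1}$, which is where the $n^{k+1}$ growth comes from.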

This paper proposes an alternative filtration called the sparse
Vietoris-Rips filtration, which has size $O(n)$ and can be computed in
$O(n \log n)$ time. Moreover, the persistence diagram of this new
filtration is provably close to that of the Vietoris-Rips filtration. The
constants depend only on the doubling dimension $d$ of the metric (defined
below) and a user-defined parameter $\varepsilon$ governing the tightness
of the approximation. For the $k$-skeleton, the constants are bounded by
$(1/\varepsilon)^{O(kd)}$.

The main tool we use to construct the sparse filtration is the net-tree of
Har-Peled and Mendel [25]. Net-trees are closely related to hierarchical
metric spanners [22, 23] and their construction is analogous to data
structures used for nearest neighbor search in metric spaces [16, 12, 13].

Outline. After reviewing some related work and definitions in Sections 2
and 3, we explain how to perturb the input metric using weighted distances
in Section 4. This perturbation is used in the definition of a sparse
zigzag filtration in Section 5, i.e. one in which simplices are both added
and removed as the scale increases. The full definition of the net-trees
is given in Section 6. Using the properties of the net-tree and the
perturbed distances, we prove in Section 7 that removing points from the
filtration does not change the topology. This implies that the zigzag
filtration does not actually zigzag at the homology level (Subsection 8.1).
The zigzag filtration can then be converted into an ordinary (i.e.
non-zigzag) filtration that also approximates the Vietoris-Rips filtration
(Subsection 8.2). The theoretical guarantees are proven in Section 9.
Subsection 9.1 proves that the resulting persistence diagrams are good
approximations to the persistence diagram of the full Vietoris-Rips
filtration. The size complexity of the sparse filtration is shown to be
$O(n)$ in Subsection 9.2. Finally, in Section 10, we outline the
$O(n \log n)$-time construction, which turns out to be quite easy once you
have a net-tree.

2 Related Work 

The theory of persistent homology |2H [55] gives an algorithm for computing 
the persistent topological features of a complex that grows over time. It 
has been applied successfully to many problem domains, including image 
analysis [6], biology [30, 11], and sensor networks [19, 18]. See also the
survey by Carlsson for background on the topological view of data [3]. It is
also possible to consider the complexes that alternate between growing and 
shrinking in what is known as zigzag persistence |3H HI [H [27] . 

Due to the rapid blowup in the size of the Vietoris-Rips filtration, some 
attempts have been made to build approximations. Some notable examples 
include witness complexes [21 [231 IZ] as well as the mesh-based methods of 
Hudson et al. in Euclidean spaces [26].

The work most similar to the current paper is by Chazal and Oudot [10].
In that paper, they looked at a sequence of persistence diagrams on denser 
and denser subsamples. However, they were not able to combine these di- 
agrams into a single diagram with a provable guarantee. Moreover, they 
were not able to prove general guarantees on the size of the filtration except 
under very strict assumptions on the data. 

Recently, Zomorodian [32] and Attali et al. [1] have presented new
methods for simplifying Vietoris-Rips complexes. These methods depend only
on the combinatorial structure. However, they have not yielded results in 
simplifying filtrations, only static complexes. In this paper, we exploit the 
geometry to get topologically equivalent sparsification of an entire filtration. 



3 Background 

Doubling metrics. For a point $p \in P$ and a set $S \subseteq P$, we will
write $d(p, S)$ to denote the minimum distance from $p$ to $S$, i.e.
$d(p, S) = \min_{q \in S} d(p, q)$. In a metric space
$\mathcal{M} = (P, d)$, a metric ball centered at $p \in P$ with radius
$r \in \mathbb{R}$ is the set
$\mathrm{ball}(p, r) = \{q \in P : d(p, q) \le r\}$.

Definition. The doubling constant $\lambda$ of a metric space
$\mathcal{M} = (P, d)$ is the minimum number of metric balls of radius $r$
required to cover any ball of radius $2r$. The doubling dimension is
$\dim = \lceil \lg \lambda \rceil$. A metric space whose doubling
dimension is bounded by a constant is called a doubling metric.

The spread $\Delta$ of a metric space $\mathcal{M} = (P, d)$ is the ratio
of the largest to smallest interpoint distances. A metric with doubling
dimension $d$ and spread $\Delta$ has at most $\Delta^{O(d)}$ points.

Simplicial Complexes. A simplicial complex $X$ is a collection of
vertices denoted $V(X)$ and a collection of subsets of $V(X)$ called
simplices that is closed under the subset operation, i.e.
$\sigma \subseteq \psi$ and $\psi \in X$ together imply $\sigma \in X$.
The dimension of a simplex $\sigma$ is $|\sigma| - 1$, where $|\cdot|$
denotes cardinality. Note that this definition is combinatorial rather
than geometric. These abstract simplicial complexes are not necessarily
embedded in a geometric space.

Homology. In this paper we will use simplicial homology over a field (see
Munkres [28] for an accessible introduction to algebraic topology). Thus,
given a space $X$, the homology groups $\mathrm{H}_i(X)$ are vector spaces
for each $i$. Let $\mathrm{H}_*(X)$ denote the collection of these
homology groups for all $i$.

The star subscript denotes the homomorphism of homology groups induced by
a map between spaces, i.e. $f : X \to Y$ induces
$f_* : \mathrm{H}_*(X) \to \mathrm{H}_*(Y)$. We recall the functorial
properties of the homology operator, $\mathrm{H}_*(\cdot)$. In particular,
$(f \circ g)_* = f_* \circ g_*$ and
$(\mathrm{id}_X)_* = \mathrm{id}_{\mathrm{H}_*(X)}$, where $\mathrm{id}$
indicates the identity map.

Persistence Modules and Diagrams. A filtration is a nested sequence of
topological spaces: $X_1 \subseteq X_2 \subseteq \cdots \subseteq X_n$. If
the spaces are simplicial complexes (as with all the filtrations in this
paper), then it is called a filtered simplicial complex.

A persistence module is a sequence of homology groups connected by
homomorphisms:

\[ \mathrm{H}_*(X_1) \to \mathrm{H}_*(X_2) \to \cdots \to \mathrm{H}_*(X_n). \]

The homology functor turns a filtration with inclusion maps
$X_1 \hookrightarrow X_2 \hookrightarrow \cdots$ into a persistence
module, but as we will see, this is not the only way to get one.

One can also consider zigzag filtrations, which allow the inclusions to
go in both directions:
$X_1 \subseteq X_2 \supseteq X_3 \subseteq \cdots$. The resulting module
is called a zigzag module.

\[ \mathrm{H}_*(X_1) \to \mathrm{H}_*(X_2) \leftarrow \mathrm{H}_*(X_3) \to \cdots. \]

The persistence diagram of a persistence module is a multiset of points
in $(\mathbb{R} \cup \{\infty\})^2$. Each point of the diagram represents
a topological feature. The $x$ and $y$ coordinates of the points are the
birth and death times of the feature and correspond to the indices in the
persistence module where that feature appears and disappears. Points far
from the diagonal persisted for a long time, while those "non-persistent"
points near the diagonal may be considered topological noise. By
convention, the persistence diagram also contains every point $(x, x)$ of
the diagonal with infinite multiplicity.

Given a filtration $\mathcal{F}$, we let $\mathrm{D}\mathcal{F}$ denote the
persistence diagram of the persistence module generated by $\mathcal{F}$.
The persistence algorithm computes a persistence diagram from
$\mathcal{F}$ [33]. It is also known how to compute a
persistence diagram when J^ is a zigzag filtration [^I27j. 

Approximating Persistence Diagrams. Given two filtrations $\mathcal{F}$
and $\mathcal{G}$, we say that the persistence diagram
$\mathrm{D}\mathcal{F}$ is a $c$-approximation to the diagram
$\mathrm{D}\mathcal{G}$ if there is a bijection
$\phi : \mathrm{D}\mathcal{F} \to \mathrm{D}\mathcal{G}$ such that for
each $p \in \mathrm{D}\mathcal{F}$, the birth times of $p$ and $\phi(p)$
differ by at most a factor of $c$ and the death times also differ by at
most a factor of $c$. The reader familiar with stability results for
persistent homology [15, 17] will recognize this as bounding the
$\ell_\infty$-bottleneck distance between the persistence diagrams after
reparameterizing the filtrations on a log-scale.
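As a small illustration of the definition, the sketch below checks the multiplicative condition for a matching that is already given. It does not search for the bijection, and it assumes strictly positive birth and death times with `math.inf` marking features that never die. The helper names `within_factor` and `is_c_approximation` are hypothetical.

```python
import math


def within_factor(x: float, y: float, c: float) -> bool:
    """True if x and y differ by at most a multiplicative factor c."""
    if math.isinf(x) or math.isinf(y):
        return x == y  # an infinite time can only match an infinite time
    return x <= c * y and y <= c * x


def is_c_approximation(matching, c: float) -> bool:
    """matching: list of ((birth, death), (birth', death')) pairs
    describing a given bijection between two diagrams."""
    return all(
        within_factor(b1, b2, c) and within_factor(d1, d2, c)
        for (b1, d1), (b2, d2) in matching
    )


if __name__ == "__main__":
    m = [((1.0, 2.0), (1.1, 2.3)), ((0.5, math.inf), (0.5, math.inf))]
    print(is_c_approximation(m, 1.2), is_c_approximation(m, 1.1))
```

Taking logarithms turns the factor-$c$ condition into an additive bound of $\log c$ on each coordinate, which is the log-scale bottleneck reading mentioned above.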

We will make use of two standard results on persistence diagrams. The 
first gives a sufficient condition for two persistence modules to yield identical 
persistence diagrams. 



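Theorem 3.1 (Persistence Equivalence Theorem [20, page 159]). Consider
two sequences of vector spaces connected by homomorphisms
$\phi_i : U_i \to V_i$:

\[
\begin{array}{ccccccccc}
V_0 & \to & V_1 & \to & \cdots & \to & V_{n-1} & \to & V_n \\
\uparrow & & \uparrow & & & & \uparrow & & \uparrow \\
U_0 & \to & U_1 & \to & \cdots & \to & U_{n-1} & \to & U_n
\end{array}
\]

If the vertical maps are isomorphisms and all squares commute then the
persistence diagram defined by the $U_i$ is the same as that defined by
the $V_i$.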

We prove approximation guarantees for persistence diagrams using the
following lemma, which is a direct corollary of the Strong Stability
Theorem of Chazal et al. [7] rephrased in the language of approximate
persistence diagrams.

Lemma 3.2 (Persistence Approximation Lemma). For any two filtrations
$\mathcal{A} = \{A_\alpha\}_{\alpha \ge 0}$ and
$\mathcal{B} = \{B_\alpha\}_{\alpha \ge 0}$, if
$A_{\alpha/c} \subseteq B_\alpha \subseteq A_{c\alpha}$ for all
$\alpha \ge 0$ then the persistence diagram $\mathrm{D}\mathcal{A}$ is a
$c$-approximation to the persistence diagram $\mathrm{D}\mathcal{B}$.

Contiguous Simplicial Maps Contiguity gives a discrete version of ho- 
motopy theory for simplicial complexes. 

Definition. Let $X$ and $Y$ be simplicial complexes. A simplicial map
$f : X \to Y$ is a function that maps vertices of $X$ to vertices of $Y$
such that $f(\sigma) = \bigcup_{v \in \sigma} f(v)$ is a simplex of $Y$
for all $\sigma \in X$.

A simplicial map is determined by its behavior on the vertex set.
Consequently, we will abuse notation slightly and identify maps between
vertex sets and maps between simplices. When it is relevant and
non-obvious, we will always prove that the resulting map between
simplicial complexes is simplicial.

Definition. Two simplicial maps $f, g : X \to Y$ are contiguous if
$f(\sigma) \cup g(\sigma) \in Y$ for all $\sigma \in X$.

Definition. For any pair of topological spaces $X \subseteq Y$, a map
$f : Y \to X$ is a retraction if $f(x) = x$ for all $x \in X$.
Equivalently, $f \circ i = \mathrm{id}_X$ where
$i : X \hookrightarrow Y$ is the inclusion map.
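The definitions above are purely combinatorial, so they can be checked mechanically on small examples. The following sketch represents a complex as a set of `frozenset` simplices and a vertex map as a dictionary. The helper names (`closure`, `image`, `is_simplicial`, `are_contiguous`) are ours, chosen for illustration only.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All nonempty faces of the given simplices, as frozensets."""
    faces = set()
    for s in maximal_simplices:
        for k in range(1, len(s) + 1):
            faces.update(frozenset(f) for f in combinations(s, k))
    return faces


def image(f, sigma):
    """Image of simplex sigma under the vertex map f (a dict)."""
    return frozenset(f[v] for v in sigma)


def is_simplicial(f, X, Y):
    return all(image(f, s) in Y for s in X)


def are_contiguous(f, g, X, Y):
    return all((image(f, s) | image(g, s)) in Y for s in X)


if __name__ == "__main__":
    filled = closure([(0, 1, 2)])            # triangle with its interior
    hollow = closure([(0, 1), (1, 2), (0, 2)])  # boundary only
    ident = {0: 0, 1: 1, 2: 2}
    const = {0: 0, 1: 0, 2: 0}               # collapse everything to vertex 0
    print(are_contiguous(ident, const, filled, filled))  # filled triangle
    print(are_contiguous(ident, const, hollow, hollow))  # hollow triangle
```

On the filled triangle the identity and the constant map are contiguous, while on the hollow triangle they are not, because the edge $\{1, 2\}$ together with vertex $0$ would need the missing 2-simplex.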

The theory of contiguity is a simplicial analogue of homotopy theory.
If two simplicial maps are contiguous then they induce identical
homomorphisms at the homology level [28, §12]. The following lemma gives a
homology analogue of a deformation retraction.



Lemma 3.3. Let $X$ and $Y$ be simplicial complexes such that
$X \subseteq Y$ and let $i : X \hookrightarrow Y$ be the canonical
inclusion map. If there exists a simplicial retraction $\pi : Y \to X$
such that $i \circ \pi$ and $\mathrm{id}_Y$ are contiguous, then $i$
induces an isomorphism $i_* : \mathrm{H}_*(X) \to \mathrm{H}_*(Y)$ between
the corresponding homology groups.

Proof. Since $i \circ \pi$ and $\mathrm{id}_Y$ are contiguous, the induced
homomorphisms $(i \circ \pi)_* : \mathrm{H}_*(Y) \to \mathrm{H}_*(Y)$ and
$(\mathrm{id}_Y)_* : \mathrm{H}_*(Y) \to \mathrm{H}_*(Y)$ are identical
[28, §12]. Since $(\mathrm{id}_Y)_* = (i \circ \pi)_* = i_* \circ \pi_*$
is an isomorphism, it follows that $i_*$ is surjective.

Since $\pi$ is a retraction, $\pi \circ i = \mathrm{id}_X$ and thus
$(\pi \circ i)_* : \mathrm{H}_*(X) \to \mathrm{H}_*(X)$ and
$(\mathrm{id}_X)_* : \mathrm{H}_*(X) \to \mathrm{H}_*(X)$ are identical.
Since $(\mathrm{id}_X)_* = (\pi \circ i)_* = \pi_* \circ i_*$ is an
isomorphism, it follows that $i_*$ is injective.

Thus, $i_*$ is an isomorphism because it is both injective and
surjective. $\square$

4 The Relaxed Vietoris-Rips Filtration 

In this section, we relax the input metric so that it is no longer a
metric, but it will still be provably close to the input. The new distance
adds a small weight to each point that grows with $\alpha$. The intuition
behind this process is illustrated in Figure 1.




Figure 1: Top: Some points on a line. The white point contributes little to
the union of $\alpha$-balls. Bottom: Using the relaxed distance, the new
$\alpha$-ball is completely contained in the union of the other balls.
Later, this property is the key that allows us to remove this point
without changing the topology.



Throughout, we assume the user-defined parameter e < 2 is fixed. Each 
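point $p$ is assigned a deletion time $t_p \in \mathbb{R}_{\ge 0}$. The
specific choice of $t_p$ will come from the net-tree construction in
Section 6. For now, we will assume the deletion times are given, assuming
only that they are nonnegative. The weight $w_p(\alpha)$ of point $p$ at
scale $\alpha$ is defined as

\[
w_p(\alpha) =
\begin{cases}
0 & \text{if } \alpha \le (1 - 2\varepsilon)t_p \\
\frac{1}{2}\bigl(\alpha - (1 - 2\varepsilon)t_p\bigr) & \text{if } (1 - 2\varepsilon)t_p < \alpha < t_p \\
\varepsilon\alpha & \text{if } t_p \le \alpha
\end{cases}
\]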


The relaxed distance at scale $\alpha$ is defined as

\[ \hat{d}_\alpha(p, q) = d(p, q) + w_p(\alpha) + w_q(\alpha). \]

For any pair $p, q \in P$, the relaxed distance $\hat{d}_\alpha(p, q)$ is
monotonically non-decreasing in $\alpha$. In particular,
$\hat{d}_\alpha \ge \hat{d}_0 = d$ for all $\alpha \ge 0$. Although
distances can grow as $\alpha$ grows, this growth is sufficiently slow to
allow the following lemma, which will be useful later.

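Lemma 4.1. If $\hat{d}_\alpha(p, q) \le \alpha \le \beta$ then
$\hat{d}_\beta(p, q) \le \beta$.

Proof. The weight of a point is $\frac{1}{2}$-Lipschitz in $\alpha$, so
$w_p(\beta) \le w_p(\alpha) + \frac{1}{2}|\beta - \alpha|$, and similarly,
$w_q(\beta) \le w_q(\alpha) + \frac{1}{2}|\beta - \alpha|$. So,

\[
\begin{aligned}
\hat{d}_\beta(p, q) &= d(p, q) + w_p(\beta) + w_q(\beta) \\
&\le d(p, q) + w_p(\alpha) + w_q(\alpha) + (\beta - \alpha) \\
&= \hat{d}_\alpha(p, q) + \beta - \alpha \\
&\le \beta.
\end{aligned}
\]

$\square$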
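The weight function and the relaxed distance translate directly into code. The sketch below is our own illustration, not the paper's implementation: it assumes the middle branch of the weight has slope 1/2 (the value that makes the three pieces continuous at $\alpha = t_p$) and that $\varepsilon < 1/2$. It then spot-checks the conclusion of Lemma 4.1 on a grid of scales.

```python
def weight(alpha: float, t_p: float, eps: float) -> float:
    """Weight w_p(alpha) of a point with deletion time t_p."""
    lo = (1.0 - 2.0 * eps) * t_p
    if alpha <= lo:
        return 0.0                  # no perturbation yet
    if alpha < t_p:
        return 0.5 * (alpha - lo)   # ramps up with slope 1/2
    return eps * alpha              # after deletion time


def relaxed_distance(d_pq, t_p, t_q, alpha, eps):
    """Relaxed distance at scale alpha for two points at distance d_pq."""
    return d_pq + weight(alpha, t_p, eps) + weight(alpha, t_q, eps)


if __name__ == "__main__":
    eps = 0.25
    scales = [0.5 * i for i in range(1, 40)]
    for a in scales:
        for b in scales:
            if a <= b and relaxed_distance(1.0, 4.0, 6.0, a, eps) <= a:
                # Lemma 4.1: an edge present at scale a stays at scale b.
                assert relaxed_distance(1.0, 4.0, 6.0, b, eps) <= b + 1e-12
    print("Lemma 4.1 holds on the sample grid")
```

The deletion times 4.0 and 6.0 and the distance 1.0 are arbitrary test values.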

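Given a set $P$, a distance function $d' : P \times P \to \mathbb{R}$, and
a scale parameter $\alpha \in \mathbb{R}$, we can construct a
Vietoris-Rips complex

\[ \mathrm{VR}(P, d', \alpha) = \{\sigma \subseteq P : d'(p, q) \le \alpha \text{ for all } p, q \in \sigma\}. \]

The Vietoris-Rips complex associated with the input metric space $(P, d)$
is $\mathcal{R}_\alpha = \mathrm{VR}(P, d, \alpha)$. The relaxed
Vietoris-Rips complex is
$\hat{\mathcal{R}}_\alpha = \mathrm{VR}(P, \hat{d}_\alpha, \alpha)$.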
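For small inputs the definition of a Vietoris-Rips complex can be evaluated by brute force. The sketch below enumerates every subset of at most `max_dim + 1` points and keeps those whose pairwise distances are all at most the scale. It is exponential and meant only to make the definition concrete. The function name and the example points are our own.

```python
from itertools import combinations


def vietoris_rips(points, dist, alpha, max_dim):
    """Simplices (as tuples) of VR(points, dist, alpha) up to max_dim."""
    simplices = [(p,) for p in points]
    for size in range(2, max_dim + 2):
        for sigma in combinations(points, size):
            # sigma is a simplex iff every pair is within distance alpha
            if all(dist(p, q) <= alpha for p, q in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices


if __name__ == "__main__":
    pts = [0, 1, 2, 4]                  # points on a line
    line = lambda p, q: abs(p - q)
    for a in (1, 2):
        print(a, vietoris_rips(pts, line, a, max_dim=2))
```

Passing a scale-dependent distance for `dist` gives the relaxed complex with the same code.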

By considering the family of Vietoris-Rips complexes for all values of
$\alpha \ge 0$, we get the Vietoris-Rips filtration,
$\mathcal{R} = \{\mathcal{R}_\alpha\}_{\alpha \ge 0}$. Similarly, we may
define the relaxed Vietoris-Rips filtration,
$\hat{\mathcal{R}} = \{\hat{\mathcal{R}}_\alpha\}_{\alpha \ge 0}$.
Lemma 4.1 implies that $\hat{\mathcal{R}}$ is indeed a filtration. The
filtrations $\mathcal{R}$ and $\hat{\mathcal{R}}$ are very similar. The
following lemma makes this similarity precise via a multiplicative
interleaving.



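Lemma 4.2. For all $\alpha \ge 0$,
$\mathcal{R}_{\alpha/(1+\varepsilon_0)} \subseteq \hat{\mathcal{R}}_\alpha \subseteq \mathcal{R}_\alpha$,
where $\varepsilon_0 = \frac{2\varepsilon}{1 - 2\varepsilon}$.

Proof. To prove inclusions between Vietoris-Rips complexes, it suffices to
prove inclusion of the edge sets. For the first inclusion, we must prove
that for any pair $p, q$, if $d(p, q) \le \frac{\alpha}{1+\varepsilon_0}$
then $\hat{d}_\alpha(p, q) \le \alpha$. Fix any such pair $p, q$. By
definition, $w_p(\alpha) \le \varepsilon\alpha$ and
$w_q(\alpha) \le \varepsilon\alpha$. So,

\[ \hat{d}_\alpha(p, q) = d(p, q) + w_p(\alpha) + w_q(\alpha) \le \frac{\alpha}{1+\varepsilon_0} + 2\varepsilon\alpha = \alpha. \]

For the second inclusion, $\hat{d}_\alpha \ge d$. So, if
$\hat{d}_\alpha(p, q) \le \alpha$ then $d(p, q) \le \alpha$ as well. Thus
any edge of $\hat{\mathcal{R}}_\alpha$ is also an edge of
$\mathcal{R}_\alpha$. $\square$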
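The last equality in the first inclusion rests on a short computation that the proof leaves implicit. Reading the constant in Lemma 4.2 as $\varepsilon_0 = 2\varepsilon/(1 - 2\varepsilon)$, it can be checked as follows.

```latex
\[
\frac{1}{1+\varepsilon_0}
  = \frac{1}{1+\frac{2\varepsilon}{1-2\varepsilon}}
  = \frac{1-2\varepsilon}{(1-2\varepsilon)+2\varepsilon}
  = 1-2\varepsilon,
\qquad\text{so}\qquad
\frac{\alpha}{1+\varepsilon_0} + 2\varepsilon\alpha
  = (1-2\varepsilon)\alpha + 2\varepsilon\alpha
  = \alpha .
\]
```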

5 The Sparse Zigzag Vietoris-Rips Filtration 

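We will construct a sparse subcomplex of the relaxed Vietoris-Rips complex
$\hat{\mathcal{R}}_\alpha$ that is guaranteed to have linear size for any
$\alpha$. In fact, we will get a zigzag filtration that only has a linear
total number of simplices, yet its persistence diagram is identical to
that of the relaxed Vietoris-Rips filtration. We define the open net
$\mathcal{N}_\alpha$ at scale $\alpha$ to be the subset of $P$ with
deletion time greater than $\alpha$, i.e.

\[ \mathcal{N}_\alpha = \{p \in P : t_p > \alpha\}. \]

Similarly, the closed net at scale $\alpha$ is

\[ \overline{\mathcal{N}}_\alpha = \{p \in P : t_p \ge \alpha\}. \]

The sparse zigzag Vietoris-Rips complex $\mathcal{Q}_\alpha$ at scale
$\alpha$ is just the subcomplex of $\hat{\mathcal{R}}_\alpha$ induced on
the vertices of $\mathcal{N}_\alpha$. Formally,

\[ \mathcal{Q}_\alpha = \{\sigma \in \hat{\mathcal{R}}_\alpha : \sigma \subseteq \mathcal{N}_\alpha\} = \mathrm{VR}(\mathcal{N}_\alpha, \hat{d}_\alpha, \alpha). \]

We also define a closed version of the sparse zigzag Vietoris-Rips complex:

\[ \overline{\mathcal{Q}}_\alpha = \mathrm{VR}(\overline{\mathcal{N}}_\alpha, \hat{d}_\alpha, \alpha). \]

Note that if $\alpha \ne t_p$ for all $p \in P$ then
$\overline{\mathcal{N}}_\alpha = \mathcal{N}_\alpha$ and
$\overline{\mathcal{Q}}_\alpha = \mathcal{Q}_\alpha$.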
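The open and closed nets differ only at scales equal to some deletion time, which the following sketch makes visible on three points of a line. Everything here is illustrative: the coordinates and deletion times are made-up values (they do not come from a net-tree), the weight function assumes the slope-1/2 middle branch, and only the edges of the sparse complexes are computed.

```python
import math
from itertools import combinations


def weight(alpha, t_p, eps):
    lo = (1.0 - 2.0 * eps) * t_p
    if alpha <= lo:
        return 0.0
    if alpha < t_p:
        return 0.5 * (alpha - lo)
    return eps * alpha


def net(t, alpha, closed=False):
    """Open net {p : t_p > alpha}, or the closed net {p : t_p >= alpha}."""
    if closed:
        return {p for p in t if t[p] >= alpha}
    return {p for p in t if t[p] > alpha}


def sparse_edges(coords, t, alpha, eps, closed=False):
    """Edges of the (open or closed) sparse complex at scale alpha."""
    verts = sorted(net(t, alpha, closed))
    edges = []
    for p, q in combinations(verts, 2):
        d_hat = (abs(coords[p] - coords[q])
                 + weight(alpha, t[p], eps) + weight(alpha, t[q], eps))
        if d_hat <= alpha:
            edges.append((p, q))
    return edges


if __name__ == "__main__":
    coords = {"a": 0.0, "b": 1.0, "c": 1.2}      # hypothetical points
    t = {"a": math.inf, "b": 10.0, "c": 2.0}     # hypothetical deletion times
    print(sparse_edges(coords, t, 2.0, 0.25, closed=True))
    print(sparse_edges(coords, t, 2.0, 0.25, closed=False))
```

At scale 2.0, the deletion time of `c`, the closed complex still contains `c` and its edges while the open complex has already dropped them.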

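The complexes $\mathcal{R}_\alpha$, $\hat{\mathcal{R}}_\alpha$,
$\mathcal{Q}_\alpha$, and $\overline{\mathcal{Q}}_\alpha$ are well-defined
for all $\alpha \ge 0$; however, they only change at discrete scales. Let
$A = \{\alpha_i\}_{i \in \mathbb{N}}$ be an ordered, discrete set of
nonnegative real numbers such that $t_p \in A$ for all $p \in P$ and
$\alpha \in A$ for any pair $p, q$ such that
$\alpha = \hat{d}_\alpha(p, q)$. That is, $A$ contains every scale at which
a combinatorial change happens, either a point deletion or an edge
insertion. This implies that
$\mathcal{N}_{\alpha_i} = \overline{\mathcal{N}}_{\alpha_{i+1}}$ and thus,
using Lemma 4.1, that
$\mathcal{Q}_{\alpha_i} \subseteq \overline{\mathcal{Q}}_{\alpha_{i+1}}$.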

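The sparse Vietoris-Rips complexes can be arranged into a zigzag
filtration $\mathcal{Q}$ as follows.

\[ \overline{\mathcal{Q}}_{\alpha_1} \supseteq \mathcal{Q}_{\alpha_1} \subseteq \overline{\mathcal{Q}}_{\alpha_2} \supseteq \mathcal{Q}_{\alpha_2} \subseteq \overline{\mathcal{Q}}_{\alpha_3} \supseteq \cdots \]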



We will return to Q later as it has some interesting properties. However, 
at this point, it is underspecified as we have not yet shown how to compute 
the deletion times for the vertices. The next section will fill this gap. 

6 Hierarchical Net-Trees 

The following treatment of net-trees is adapted from the paper by Har-Peled
and Mendel [25].

Definition. A net-tree of a metric $\mathcal{M} = (P, d)$ is a rooted tree
$T$ with each vertex $v \in T$ having a representative point
$\mathrm{rep}(v) \in P$. There are $n = |P|$ leaves, each represented by a
different point of $P$. Each non-root vertex $v \in T$ has a unique parent
$\mathrm{par}(v)$. The set of vertices with the same parent $v$ are called
the children of $v$, denoted $\mathrm{child}(v)$. If $\mathrm{child}(v)$
is nonempty then for some $u \in \mathrm{child}(v)$,
$\mathrm{rep}(u) = \mathrm{rep}(v)$. The set $P_v \subseteq P$ denotes the
points represented by the leaves of the subtree rooted at $v$. Each vertex
$v \in T$ has an associated radius $\mathrm{rad}(v)$ satisfying the
following two conditions.

1. Covering Condition: $P_v \subseteq \mathrm{ball}(\mathrm{rep}(v), \mathrm{rad}(v))$, and

2. Packing Condition: if $v$ is not the root, then
$P \cap \mathrm{ball}(\mathrm{rep}(v), K_p\,\mathrm{rad}(\mathrm{par}(v))) \subseteq P_v$,
where $K_p$ (the "p" is for "packing") is a constant independent of
$\mathcal{M}$ and $n$.




Figure 2: A net-tree is built over the set of points from Figure 1. Each
level of the tree represents a sparse approximation to the original point
set at a different scale.

The radii of the net-tree nodes are always some constant times larger
than the radius of their children. Simple packing arguments guarantee that
no node of the tree has more than $\lambda^{O(1)}$ children, where
$\lambda$ is the doubling constant of the metric. The whole tree can be
constructed in $O(n \log n)$ randomized time or $O(n \log \Delta)$
deterministically [25]. Moreover, it is important to note the construction
does not require that we know the doubling dimension in advance.

Given a net-tree $T$ for $\mathcal{M} = (P, d)$ and a point $p \in P$, let
$v_p$ denote the least common ancestor of all vertices in $T$ represented
by $p$. For each $p \in P$ the deletion time $t_p$ is defined as

\[ t_p = \frac{1}{\varepsilon(1 - 2\varepsilon)}\,\mathrm{rad}(\mathrm{par}(v_p)). \]

This is just the radius of the parent of $v_p$ with a small scaling factor
included for technical reasons. When the scale $\alpha$ reaches $t_p$, we
remove point $p$ from the (zigzag) filtration.
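Once a net-tree is available, the deletion times can be read off in a single pass. The sketch below assumes a hypothetical dictionary representation of the tree (`parent`, `rep`, `rad` keyed by vertex name), which is not Har-Peled and Mendel's data structure. It uses the fact that the vertices represented by a point form a path, so the topmost one is the vertex whose parent has a different representative. Assigning an infinite deletion time to the root's representative is our own convention, since that vertex has no parent.

```python
import math


def deletion_times(parent, rep, rad, eps):
    """Map each point to rad(par(v_p)) / (eps * (1 - 2 * eps))."""
    scale = 1.0 / (eps * (1.0 - 2.0 * eps))
    times = {}
    for v, p in rep.items():
        par = parent.get(v)
        if par is None:
            times[p] = math.inf          # assumption: root point is never deleted
        elif rep[par] != p:
            times[p] = scale * rad[par]  # v is the topmost vertex for p
    return times


if __name__ == "__main__":
    # Toy tree: root r (rep a) with children u1 (rep a) and u2 (rep c);
    # leaves la, lb under u1 and lc under u2.
    parent = {"r": None, "u1": "r", "u2": "r", "la": "u1", "lb": "u1", "lc": "u2"}
    rep = {"r": "a", "u1": "a", "u2": "c", "la": "a", "lb": "b", "lc": "c"}
    rad = {"r": 4.0, "u1": 1.0, "u2": 1.0, "la": 0.0, "lb": 0.0, "lc": 0.0}
    print(deletion_times(parent, rep, rad, eps=0.25))
```

With $\varepsilon = 0.25$ the scaling factor is 8, so point `b` is deleted at 8 times the radius of `u1` and point `c` at 8 times the radius of the root.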

For a fixed scale $\alpha \in \mathbb{R}$, the set $\mathcal{N}_\alpha$ is
a subset of points of $P$ induced by the net-tree. The sets
$\mathcal{N}_\alpha$ are the nets of the net-tree. For any $\alpha$, the
net satisfies a packing condition and a covering condition as defined in
the following lemma.

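Lemma 6.1. Let $\mathcal{M} = (P, d)$ be a metric space and let $T$ be a
net-tree for $\mathcal{M}$. For all $\alpha \ge 0$, the net
$\mathcal{N}_\alpha$ induced by $T$ at scale $\alpha$ satisfies the
following two properties.

1. Covering Condition: For all $p \in P$,
$d(p, \mathcal{N}_\alpha) \le \varepsilon(1 - 2\varepsilon)\alpha$.

2. Packing Condition: For all distinct $p, q \in \mathcal{N}_\alpha$,
$d(p, q) \ge K_p\,\varepsilon(1 - 2\varepsilon)\alpha$.

Proof. First, we prove that the covering condition holds. Fix any
$p \in P$. The statement is trivial if $p \in \mathcal{N}_\alpha$ so we
may assume that $t_p \le \alpha$.

Let $v$ be the lowest ancestor of $p$ in $T$ such that
$\mathrm{rep}(v) \in \mathcal{N}_\alpha$. Let $u$ be the ancestor of $p$
among the children of $v$. If $q = \mathrm{rep}(u)$ then
$t_q = \frac{\mathrm{rad}(v)}{\varepsilon(1 - 2\varepsilon)}$. By our
choice of $v$, $q \notin \mathcal{N}_\alpha$ and thus $t_q \le \alpha$. It
follows that $\mathrm{rad}(v) \le \varepsilon(1 - 2\varepsilon)\alpha$.
Thus,
$d(p, \mathcal{N}_\alpha) \le d(p, \mathrm{rep}(v)) \le \mathrm{rad}(v) \le \varepsilon(1 - 2\varepsilon)\alpha$.

We now prove that the packing condition holds. Let $p, q$ be any two
distinct points of $\mathcal{N}_\alpha$. Without loss of generality,
assume $t_p \le t_q$. Thus, $q \notin P_{v_p}$, where (as before)
$v_p = \mathrm{lca}\{u \in T : \mathrm{rep}(u) = p\}$. Since
$p \in \mathcal{N}_\alpha$,
$\alpha < t_p = \frac{\mathrm{rad}(\mathrm{par}(v_p))}{\varepsilon(1 - 2\varepsilon)}$.
Therefore, using the packing condition on the net-tree $T$,
$d(p, q) > K_p\,\mathrm{rad}(\mathrm{par}(v_p)) > K_p\,\varepsilon(1 - 2\varepsilon)\alpha$.
$\square$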
A subset that satisfies this type of packing and covering conditions is
sometimes referred to as a metric space net (not to be confused with a
range space net) or, more accurately, as a Delone set [14].

7 Topology-preserving sparsification 

In this section, we make the intuition of Figure 1 concrete by showing that
deleting a vertex $p$ (and its incident simplices) from the relaxed
Vietoris-Rips complex $\hat{\mathcal{R}}_{t_p}$ does not change the
topology.

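For any $\alpha \ge 0$, we define the "projection" of $P$ onto
$\mathcal{N}_\alpha$ as

\[
\pi_\alpha(p) =
\begin{cases}
p & \text{if } p \in \mathcal{N}_\alpha \\
\operatorname{argmin}_{q \in \mathcal{N}_\alpha} \hat{d}_\alpha(p, q) & \text{otherwise.}
\end{cases}
\]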
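In code, the projection is a one-line nearest-neighbor query restricted to the net. The sketch below takes the relaxed distance as a callable and breaks ties by sorted order, which is an arbitrary choice on our part since the definition does not specify a tie-breaking rule.

```python
def project(p, net, d_hat):
    """Return p if it lies in the net, otherwise the net point that
    minimizes the relaxed distance d_hat(p, q)."""
    if p in net:
        return p
    # sorted() only makes tie-breaking deterministic
    return min(sorted(net), key=lambda q: d_hat(p, q))


if __name__ == "__main__":
    # Hypothetical relaxed distances from a deleted point "c".
    table = {("c", "a"): 1.7, ("c", "b"): 0.7}
    d_hat = lambda p, q: table[(p, q)]
    print(project("c", {"a", "b"}, d_hat))
    print(project("a", {"a", "b"}, d_hat))
```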

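The following lemma shows that the distance from a point to its projection
is bounded by the difference in the weights of the point and its
projection.

Lemma 7.1. For all $p \in P$,
$d(p, \pi_\alpha(p)) \le w_p(\alpha) - w_{\pi_\alpha(p)}(\alpha)$.

Proof. Fix any $p \in P$. We first prove that if
$d(p, q) \le w_p(\alpha) - w_q(\alpha)$ for some
$q \in \mathcal{N}_\alpha$, then it holds for $q = \pi_\alpha(p)$. If we
have such a $q$, then the definitions of $\hat{d}_\alpha$ and $\pi_\alpha$
imply the following.

\[
\begin{aligned}
d(p, \pi_\alpha(p)) &= \hat{d}_\alpha(p, \pi_\alpha(p)) - w_p(\alpha) - w_{\pi_\alpha(p)}(\alpha) \\
&\le \hat{d}_\alpha(p, q) - w_p(\alpha) - w_{\pi_\alpha(p)}(\alpha) \\
&= d(p, q) + w_q(\alpha) - w_{\pi_\alpha(p)}(\alpha) \\
&\le w_p(\alpha) - w_{\pi_\alpha(p)}(\alpha).
\end{aligned}
\]

So, it will suffice to find a $q \in \mathcal{N}_\alpha$ such that
$d(p, q) \le w_p(\alpha) - w_q(\alpha)$. If $p \in \mathcal{N}_\alpha$
then this is trivial. So we may assume $p \notin \mathcal{N}_\alpha$ and
therefore $t_p \le \alpha$ and $w_p(\alpha) = \varepsilon\alpha$.

Let $u \in T$ be the ancestor of $p$ such that
$\mathrm{rad}(u) \le \varepsilon\alpha$ and
$\mathrm{rad}(\mathrm{par}(u)) > \varepsilon\alpha$. Let
$q = \mathrm{rep}(u)$. Since

\[ t_q \ge \frac{1}{\varepsilon(1 - 2\varepsilon)}\,\mathrm{rad}(\mathrm{par}(u)) > \frac{\alpha}{1 - 2\varepsilon}, \]

it follows that $w_q(\alpha) = 0$ and that $q \in \mathcal{N}_\alpha$.
Finally, since $p \in P_u$,
$d(p, q) \le \mathrm{rad}(u) \le \varepsilon\alpha = w_p(\alpha) - w_q(\alpha)$.
$\square$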
By bounding the distance between points and their projections, we can
now show that distances in the projection do not grow.

Lemma 7.2. For all $p, q \in P$ and all $\alpha \ge 0$,
$\hat{d}_\alpha(\pi_\alpha(p), q) \le \hat{d}_\alpha(p, q)$.

Proof. The bound follows from the definition of $\hat{d}_\alpha$, the
triangle inequality, and Lemma 7.1. $\square$

Lemma 7.3. Let $\alpha \ge 0$ be a fixed constant. Let $X$ be a set of
points such that $\mathcal{N}_\alpha \subseteq X \subseteq P$ and let
$K = \mathrm{VR}(X, \hat{d}_\alpha, \alpha)$. The inclusion map
$i : \mathcal{Q}_\alpha \hookrightarrow K$ induces an isomorphism at the
homology level.

Proof. The map $K \to \mathcal{Q}_\alpha$ induced by $\pi_\alpha$ is a
retraction because $\mathcal{Q}_\alpha \subseteq K$ and $\pi_\alpha$ is a
retraction onto $\mathcal{N}_\alpha$, the vertex set of
$\mathcal{Q}_\alpha$. By Lemma 3.3, it will suffice to prove that
$\pi_\alpha$ is simplicial and that $i \circ \pi_\alpha$ is contiguous to
the identity map on $K$. Since $\mathcal{Q}_\alpha$ and $K$ are
Vietoris-Rips complexes, it will suffice to prove these facts for the
edges:

1. $\pi_\alpha$ is simplicial: for all $p, q \in X$, if
$\hat{d}_\alpha(p, q) \le \alpha$ then
$\hat{d}_\alpha(\pi_\alpha(p), \pi_\alpha(q)) \le \alpha$, and

2. $i \circ \pi_\alpha$ and $\mathrm{id}_K$ are contiguous: for all
$p, q \in X$, if $\hat{d}_\alpha(p, q) \le \alpha$ then all six edges of
the tetrahedron $\{p, q, \pi_\alpha(p), \pi_\alpha(q)\}$ are in $K$.

The first statement follows from two successive applications of Lemma 7.2.
The second statement follows from Lemma 7.1 for the edges
$\{p, \pi_\alpha(p)\}$ and $\{q, \pi_\alpha(q)\}$ and from Lemma 7.2 for
the other edges. $\square$

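Corollary 7.4. For all $\alpha \in A$, the inclusions
$f : \mathcal{Q}_\alpha \hookrightarrow \overline{\mathcal{Q}}_\alpha$,
$g : \mathcal{Q}_\alpha \hookrightarrow \hat{\mathcal{R}}_\alpha$, and
$h : \overline{\mathcal{Q}}_\alpha \hookrightarrow \hat{\mathcal{R}}_\alpha$
induce isomorphisms at the homology level.

Proof. The inclusions $f$ and $g$ induce isomorphisms by applying
Lemma 7.3 with $X = \overline{\mathcal{N}}_\alpha$ and $X = P$
respectively. Composing the inclusions, we get that $g = h \circ f$. Thus,
at the homology level, $h_* = g_* \circ f_*^{-1}$ is also an isomorphism.
$\square$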

8 Straightening out the Zigzags 

In this section, we show two different ways in which zigzag persistence may
be avoided. First, in Subsection 8.1, we show that the sparse zigzag
filtration $\mathcal{Q}$ does not zigzag at the homology level. Then, in
Subsection 8.2, we show how to modify the zigzag filtration so it does not
zigzag as a filtration either. The advantage of the non-zigzagging
filtration is that it allows one to use the standard persistence
algorithm, but it has larger size in the intermediate complexes. As we
will see in Subsection 9.2, the total size is still linear.
8.1 Reversing Homology Isomorphisms

The backwards arrows in the zigzag filtration $\mathcal{Q}$ all induce
isomorphisms. At the homology level, these isomorphisms can be replaced by
their inverses to give a persistence module that does not zigzag. That is,
the zigzag module

\[ \cdots \to \mathrm{H}_*(\overline{\mathcal{Q}}_\alpha) \leftarrow \mathrm{H}_*(\mathcal{Q}_\alpha) \to \mathrm{H}_*(\overline{\mathcal{Q}}_\beta) \leftarrow \mathrm{H}_*(\mathcal{Q}_\beta) \to \cdots \]

can be transformed into

\[ \cdots \to \mathrm{H}_*(\overline{\mathcal{Q}}_\alpha) \xrightarrow{\cong} \mathrm{H}_*(\mathcal{Q}_\alpha) \to \mathrm{H}_*(\overline{\mathcal{Q}}_\beta) \xrightarrow{\cong} \mathrm{H}_*(\mathcal{Q}_\beta) \to \cdots. \]

The latter module implies the existence of another that only uses the
closed sparse Vietoris-Rips complexes:

\[ \cdots \to \mathrm{H}_*(\overline{\mathcal{Q}}_\alpha) \to \mathrm{H}_*(\overline{\mathcal{Q}}_\beta) \to \cdots. \]

Note that this module does away with the duplication of indices needed
for the zigzag. In these various transformations, we have only reversed
or concatenated isomorphisms, thus we have not changed the rank of any
induced map
$\mathrm{H}_*(\overline{\mathcal{Q}}_\alpha) \to \mathrm{H}_*(\overline{\mathcal{Q}}_\beta)$.
As a result the persistence diagram $\mathrm{D}\mathcal{Q}$ is unchanged.

This is novel in that we construct a zigzag filtration and we apply the
zigzag persistence algorithm, but we are really computing the diagram of
a persistence module that does not zigzag. The zigzagging can then be
interpreted as sparsifying the complex without changing the topology.

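8.2 A Sparse Filtration without the Zigzag

The preceding subsection showed that the sparse zigzag Vietoris-Rips
filtration does not zigzag as a persistence module. This hints that it is
possible to construct a filtration that does not zigzag with the same
persistence diagram. Indeed, this is possible using the filtration

\[ \mathcal{S} = \{\mathcal{S}_{\alpha_k}\}_{\alpha_k \in A}, \quad\text{where}\quad \mathcal{S}_{\alpha_k} = \bigcup_{i=1}^{k} \overline{\mathcal{Q}}_{\alpha_i}. \]

We first prove that $\mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_k})$ and
$\mathrm{H}_*(\mathcal{S}_{\alpha_k})$ are isomorphic.

Lemma 8.1. For all $\alpha_k \in A$, the inclusion
$h : \overline{\mathcal{Q}}_{\alpha_k} \hookrightarrow \mathcal{S}_{\alpha_k}$
induces a homology isomorphism.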
homology isomorphism. 
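Proof. We define some intermediate complexes that interpolate between
$\overline{\mathcal{Q}}_{\alpha_k}$ and $\mathcal{S}_{\alpha_k}$.

\[ T_{i,k} = \bigcup_{j=i}^{k} \overline{\mathcal{Q}}_{\alpha_j} \]

In particular, we have that $\overline{\mathcal{Q}}_{\alpha_k} = T_{k,k}$
and $\mathcal{S}_{\alpha_k} = T_{1,k}$. The map $h$ can be expressed as
$h = h_1 \circ \cdots \circ h_{k-1}$, where
$h_i : T_{i+1,k} \hookrightarrow T_{i,k}$ is an inclusion. It will suffice
to prove that each $h_i$ induces an isomorphism at the homology level for
each $i = 1, \ldots, k-1$. By Lemma 3.3, it will suffice to show that the
projection $\pi_{\alpha_i} : T_{i,k} \to T_{i+1,k}$ is a simplicial
retraction and that $h_i \circ \pi_{\alpha_i}$ and $\mathrm{id}_{T_{i,k}}$
are contiguous.

Let $\sigma \in T_{i,k}$ be any simplex. So,
$\sigma \in \overline{\mathcal{Q}}_{\alpha_j}$ for some integer $j$ such
that $i \le j \le k$.

First, we prove that $\pi_{\alpha_i}$ is a retraction. If
$\sigma \in T_{i+1,k}$ then $j \ge i + 1$. So,
$\sigma \subseteq \overline{\mathcal{N}}_{\alpha_j} \subseteq \mathcal{N}_{\alpha_i}$
and thus $\pi_{\alpha_i}(\sigma) = \sigma$ because $\pi_{\alpha_i}$ is a
retraction onto $\mathcal{N}_{\alpha_i}$ by definition when viewed as a
function on the vertex sets.

Second, we show that $\pi_{\alpha_i}$ is a simplicial map from $T_{i,k}$
to $T_{i+1,k}$. Since it is a retraction, it only remains to show that
$\pi_{\alpha_i}(\sigma) \in T_{i+1,k}$ when
$\sigma \in T_{i,k} \setminus T_{i+1,k}$, i.e. when $j = i$. In this case,
$\pi_{\alpha_i}(\sigma) \in \mathcal{Q}_{\alpha_i}$ because
$\pi_{\alpha_i} : \overline{\mathcal{Q}}_{\alpha_i} \to \mathcal{Q}_{\alpha_i}$
is simplicial (as shown in the proof of Lemma 7.3). Since
$\mathcal{Q}_{\alpha_i} \subseteq \overline{\mathcal{Q}}_{\alpha_{i+1}} \subseteq T_{i+1,k}$,
it follows that $\pi_{\alpha_i}(\sigma) \in T_{i+1,k}$ as desired.

Last, we prove contiguity. We need to prove that
$\sigma \cup \pi_{\alpha_i}(\sigma) \in T_{i,k}$. If $j > i$, then
$\sigma \cup \pi_{\alpha_i}(\sigma) = \sigma \in T_{i,k}$ as desired. If
$i = j$, then
$\sigma \cup \pi_{\alpha_i}(\sigma) \in \overline{\mathcal{Q}}_{\alpha_i}$
as shown in the proof of Lemma 7.3. Since
$\overline{\mathcal{Q}}_{\alpha_i} \subseteq T_{i,k}$, it follows that
$\sigma \cup \pi_{\alpha_i}(\sigma) \in T_{i,k}$ as desired. $\square$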

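Theorem 8.2. The persistence diagrams of $\mathcal{Q}$ and $\mathcal{S}$
are identical.

Proof. For any $\alpha_i, \alpha_{i+1} \in A$, we get the following
commutative diagram where all maps are induced by inclusions.

\[
\begin{array}{ccccc}
\mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_i}) & \xleftarrow{\;\cong\;} & \mathrm{H}_*(\mathcal{Q}_{\alpha_i}) & \longrightarrow & \mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_{i+1}}) \\
\downarrow{\scriptstyle\cong} & & & & \downarrow{\scriptstyle\cong} \\
\mathrm{H}_*(\mathcal{S}_{\alpha_i}) & & \longrightarrow & & \mathrm{H}_*(\mathcal{S}_{\alpha_{i+1}})
\end{array}
\]

Lemma 8.1 and Corollary 7.4 show that the indicated maps are
isomorphisms. As in Section 8.1, we reverse the isomorphism
$\mathrm{H}_*(\mathcal{Q}_{\alpha_i}) \to \mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_i})$
to get the following diagram, which also commutes.

\[
\begin{array}{ccc}
\mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_i}) & \longrightarrow & \mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_{i+1}}) \\
\downarrow{\scriptstyle\cong} & & \downarrow{\scriptstyle\cong} \\
\mathrm{H}_*(\mathcal{S}_{\alpha_i}) & \longrightarrow & \mathrm{H}_*(\mathcal{S}_{\alpha_{i+1}})
\end{array}
\]

Therefore, by the Persistence Equivalence Theorem,
$\mathrm{D}\mathcal{Q} = \mathrm{D}\mathcal{S}$. $\square$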

9 Theoretical Guarantees 

There are two main theoretical guarantees regarding the sparse Vietoris-Rips
filtrations. First, in Subsection 9.1, we show that the resulting
persistence diagrams are good approximations to that of the true
Vietoris-Rips filtration. Second, in Subsection 9.2, we show that the
filtrations have linear size.



9.1 The Approximation Guarantee 

In this subsection we prove that the persistence diagram of the sparse
Vietoris-Rips filtration is a multiplicative
$(1 + \varepsilon)$-approximation to the persistence diagram of the
standard Vietoris-Rips filtration. The approach has two parts. First, we
show that the relaxed filtration is a multiplicative
$(1 + \varepsilon)$-approximation to the classical Vietoris-Rips
filtration. Second, we show that the sparse and relaxed Vietoris-Rips
filtrations have the same persistence diagrams, i.e. that
$\mathrm{D}\mathcal{Q} = \mathrm{D}\hat{\mathcal{R}}$. By passing through
the filtration $\hat{\mathcal{R}}$, we obviate the need to develop new
stability results for zigzag persistence.

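Theorem 9.1. For any metric space $\mathcal{M} = (P, d)$, the persistence
diagrams of the corresponding sparse Vietoris-Rips filtrations
$\mathcal{Q} = \mathcal{Q}(\mathcal{M})$ and
$\mathcal{S} = \mathcal{S}(\mathcal{M})$ both yield
$(1 + \varepsilon)$-approximations to the persistence diagram of the
Vietoris-Rips filtration $\mathcal{R} = \mathcal{R}(\mathcal{M})$.

Proof. By Lemma 4.2, we have a multiplicative
$(1 + \varepsilon)$-interleaving between $\mathcal{R}$ and
$\hat{\mathcal{R}}$. Thus, the Persistence Approximation Lemma implies
that $\mathrm{D}\hat{\mathcal{R}}$ is a $(1 + \varepsilon)$-approximation
to $\mathrm{D}\mathcal{R}$.

We have shown in Theorem 8.2 that
$\mathrm{D}\mathcal{Q} = \mathrm{D}\mathcal{S}$, so it will suffice to
prove that $\mathrm{D}\mathcal{Q} = \mathrm{D}\hat{\mathcal{R}}$. The rest
of the proof follows the same pattern as in Theorem 8.2. For any
$\alpha_i, \alpha_{i+1} \in A$, we get the following commutative diagram
induced by inclusion maps.

\[
\begin{array}{ccccc}
\mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_i}) & \xleftarrow{\;\cong\;} & \mathrm{H}_*(\mathcal{Q}_{\alpha_i}) & \longrightarrow & \mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_{i+1}}) \\
\downarrow{\scriptstyle\cong} & & & & \downarrow{\scriptstyle\cong} \\
\mathrm{H}_*(\hat{\mathcal{R}}_{\alpha_i}) & & \longrightarrow & & \mathrm{H}_*(\hat{\mathcal{R}}_{\alpha_{i+1}})
\end{array}
\]

Corollary 7.4 implies that many of these inclusions induce isomorphisms at
the homology level (as indicated in the diagram). As a consequence, the
following diagram also commutes and the vertical maps are isomorphisms.

\[
\begin{array}{ccc}
\mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_i}) & \longrightarrow & \mathrm{H}_*(\overline{\mathcal{Q}}_{\alpha_{i+1}}) \\
\downarrow{\scriptstyle\cong} & & \downarrow{\scriptstyle\cong} \\
\mathrm{H}_*(\hat{\mathcal{R}}_{\alpha_i}) & \longrightarrow & \mathrm{H}_*(\hat{\mathcal{R}}_{\alpha_{i+1}})
\end{array}
\]

So, the Persistence Equivalence Theorem implies that
$\mathrm{D}\mathcal{Q} = \mathrm{D}\hat{\mathcal{R}}$ as desired.
$\square$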

9.2 The Linear Complexity of the Sparse Filtration 

In this subsection, we prove that the total number of simplices in the sparse 
Rips filtration is only linear in the number of input points. We start by 
showing that the graph of all edges appearing in the filtration has only a 
linear number of edges. 

For a point $p \in P$, let $E(p)$ be the set of endpoints of edges from $p$ to a
point whose removal time is at least as large as that of $p$:

$E(p) = \{q \in P : t_p \le t_q \text{ and } (p, q) \in \mathcal{Q}_{t_p}\}$.

To compute the filtrations $\mathcal{Q}$ and $\mathcal{S}$, it suffices to compute $E(p)$ for each
$p \in P$. In fact $\mathcal{S}_\infty$ is just the clique complex on the graph of all edges $(p, q)$
such that $q \in E(p)$.
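
As a minimal sketch of this observation, the following Python function enumerates the simplices of a clique complex directly from the sets $E(p)$. The input format (a dictionary `E` from each point to its neighbor list), the function name `clique_complex`, and the toy data are assumptions made for illustration; computing $E(p)$ itself is not shown here.

```python
from itertools import combinations

def clique_complex(E, max_dim):
    """Return all simplices (as frozensets) of dimension <= max_dim.

    E maps each point p to the collection E(p) of neighbors whose removal
    time is at least that of p (hypothetical input format).
    """
    # Undirected edge set of the graph: {p, q} for every q in E(p).
    edges = {frozenset((p, q)) for p, nbrs in E.items() for q in nbrs if q != p}
    # Every point is a 0-simplex.
    points = set(E) | {q for nbrs in E.values() for q in nbrs}
    simplices = {frozenset((p,)) for p in points}
    for p, nbrs in E.items():
        nbrs = [q for q in nbrs if q != p]
        for k in range(1, max_dim + 1):
            for subset in combinations(nbrs, k):
                # {p} + subset is a clique iff the chosen neighbors are pairwise adjacent.
                if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                    simplices.add(frozenset((p,) + subset))
    return simplices

# Toy input: removal times increase in the order a, b, c, d (made-up data).
E = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}
complex_ = clique_complex(E, max_dim=2)
```

On the toy input the graph has edges ab, ac, bc, and cd, so the result consists of four vertices, four edges, and the single triangle abc. Collecting simplices in a set guards against double counting when two adjacent points have equal removal times.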

Lemma 9.2. Given a set of $n$ points in a metric space $\mathcal{M} = (P, d)$ with
doubling dimension $d$, the cardinality $|E(p)|$ is at most $(1/\varepsilon)^{O(d)}$ for each $p \in P$.

Proof. Let $\Delta(E(p))$ denote the spread of $E(p)$. Since $E(p)$ is a finite metric
with doubling dimension at most $d$, the number of points is at most
$\Delta(E(p))^{O(d)}$. So, it will suffice to prove that for all $p \in P$, $\Delta(E(p)) = O(1/\varepsilon)$.
The definition of $E(p)$ implies that $E(p) \subseteq N_{t_p}$ and so by Lemma 6.1
the nearest pair in $E(p)$ are at least $K_p \varepsilon (1 - 2\varepsilon) t_p$ apart. For $q \in E(p)$, since
$(p, q) \in \mathcal{Q}_{t_p}$, $d(p, q) \le \hat{d}_{t_p}(p, q) \le t_p$. It follows that the farthest pair in $E(p)$
are at most $2t_p$ apart. So, we get that
$\Delta(E(p)) \le \frac{2t_p}{K_p \varepsilon (1 - 2\varepsilon) t_p} = O(1/\varepsilon)$ as
desired. □



We see that the size of the graph in the filtration is governed by three
variables: the doubling dimension, $d$; the packing constant of the net-tree,
$K_p$; and the desired tightness of the approximation, $\varepsilon$. The preceding lemma
easily implies the following bound on the higher order simplices.

Theorem 9.3. Given a set of $n$ points in a metric space $\mathcal{M} = (P, d)$ with
doubling dimension $d$, the total number of $k$-simplices in the sparse Vietoris-Rips
filtrations $\mathcal{Q}$ and $\mathcal{S}$ is at most $(1/\varepsilon)^{O(kd)}\, n$.
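
One way to spell out the counting step is sketched below. It assumes the bound $|E(p)| \le (1/\varepsilon)^{O(d)}$ from Lemma 9.2 and uses the fact that every simplex is a clique in the graph of edges $(p, q)$ with $q \in E(p)$. Charge each $k$-simplex to one of its vertices $p$ of smallest removal time; the other $k$ vertices are adjacent to $p$ and are removed no earlier than $p$, so they lie in $E(p)$. The constants hidden in the $O(\cdot)$ notation are not tracked.

```latex
\#\{k\text{-simplices}\}
  \;\le\; \sum_{p \in P} \binom{|E(p)|}{k}
  \;\le\; \sum_{p \in P} |E(p)|^{k}
  \;\le\; n \left( \left( \tfrac{1}{\varepsilon} \right)^{O(d)} \right)^{k}
  \;=\; \left( \tfrac{1}{\varepsilon} \right)^{O(kd)} n .
```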

10 An algorithm to construct the sparse filtration 

The net-tree defines the deletion times of the input points and thus deter- 
mines the perturbed metric. It also gives the necessary data structure to 
efficiently find the neighbors of a point in the perturbed metric in order 
to compute the filtration. In fact, this is exactly the kind of search that 
the net-tree makes easy. Then we find all cliques, which takes linear time
because each is a subset of $E(p)$ for some $p \in P$ and each $E(p)$ has constant
size.

As explained in the Har-Peled and Mendel paper, it is often useful to 
augment the net-tree with "cross" edges connecting nodes at the same level 
in the tree that are represented by geometrically close points. The set of 
relatives of a node $u \in T$ is defined as

$\mathrm{Rel}(u) = \{v \in T : \mathrm{rad}(v) \le \mathrm{rad}(u) < \mathrm{rad}(\mathrm{par}(v)) \text{ and }
d(\mathrm{rep}(u), \mathrm{rep}(v)) \le C\,\mathrm{rad}(u)\}$,

where $C$ is a constant bigger than 3.* The size of $\mathrm{Rel}(u)$ is a constant using
the same packing arguments as in Lemma 9.2.

This makes it easy to do a range search to find the points of $E(p)$. In
fact, we will find the slightly larger set $E'(p) = N_{t_p} \cap \mathrm{ball}(p, t_p)$. The search
starts by finding $u$, the highest ancestor of $v_p$ whose radius is at most some



*The precise value of C depends on some constants chosen in the construction of the 
net-tree and can be extracted from the HM paper. For our purposes, we only need the 
fact that it is bigger than 3. 
Figure 3: Finding the nearby points at or above the level of $v_p$ in the net-tree
requires only traveling up a constant number of levels and then searching the
subtrees of the relatives.



fixed constant times $t_p$. Since the radius increases by a constant factor on
each level, this is only a constant number of levels. Then the subtrees rooted
at each $v \in \mathrm{Rel}(u)$ are searched down to the level of $v_p$. Thus, we search
a constant number of trees of constant degree down a constant number of
levels. The resulting search finds all of the points of $E(p)$ in constant time.
Since the work is only constant time per point, the only superlinear work
is in the computation of the net-tree. As noted before, this requires only
$O(n \log n)$ time.
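
For checking an implementation of this search on small inputs, a brute-force stand-in can be useful. The sketch below is not the net-tree search and takes quadratic time overall. It assumes the removal times are already known (the dictionary `t`), that membership in the net at scale $t_p$ means having removal time at least $t_p$, and that the metric is Euclidean; the function name `candidate_neighbors` and the toy data are hypothetical.

```python
import math

def candidate_neighbors(points, t, dist=math.dist):
    """Brute-force stand-in for the range search: E'(p) without p itself.

    points: list of coordinate tuples.
    t: dict mapping each point to its removal time (assumed given).
    """
    result = {}
    for p in points:
        result[p] = [
            q for q in points
            # q must be removed no earlier than p and lie within distance t_p of p.
            if q != p and t[q] >= t[p] and dist(p, q) <= t[p]
        ]
    return result

# Toy data: removal times are made up for illustration, not produced by a net-tree.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
t = {(0.0, 0.0): math.inf, (1.0, 0.0): 2.0, (4.0, 0.0): 5.0}
neighbors = candidate_neighbors(points, t)
```

Because the returned sets are supersets of $E(p)$, each candidate still has to be tested against the perturbed metric before it is accepted as an edge.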



11 Conclusions and Directions for Future Work 

We have presented an efficient method for approximating the persistent ho- 
mology of the Vietoris-Rips filtration. Computing these approximate per- 
sistence diagrams at all scales has the potential to make persistence-based 
methods on metric spaces tractable for much larger inputs. 

Adapting the proofs given in this paper to the Čech filtration is a simple
exercise. Moreover, it may be possible to apply a similar sparsification to 
complexes filtered by alternative distance-like functions like the distance to 
a measure introduced by Chazal et al. [9]. 

Another direction for future work is to identify a more general class of 
hierarchical structures that may be used in such a construction. The net-tree
used in this paper is just one example chosen primarily because it can
be computed efficiently. 

The analytic technique used in this paper may find more uses in the 
future. We effectively bounded the difference between the persistence dia- 
grams of a filtration and a zigzag filtration by embedding the zigzag filtration 
in a topologically equivalent filtration that does not zigzag at the homology 
level. It may be that other zigzag filtrations can be analyzed in this way. 